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Abstract 

This paper considers sequential adaptive estimation of sparse signals under a constraint on the total 
sensing effort. The advantage of adaptivity in this context is the ability to focus more resources on regions 
of space where signal components exist, thereby improving performance. A dynamic programming 
formulation is derived for the allocation of sensing effort to minimize the expected estimation loss. 
Based on the method of open-loop feedback control, allocation policies are then developed for a variety 
of loss functions. The policies are optimal in the two-stage case and improve monotonically thereafter 
with the number of stages. Numerical simulations show gains up to several dB as compared to recently 
proposed adaptive methods, and dramatic gains approaching the oracle limit compared to non-adaptive 
estimation. An application to radar imaging is also presented. 

Index Terms 

Adaptive sensing, adaptive sampling, resource allocation, sparse signals, dynamic programming. 

I. Introduction 

Adaptive sensing and inference have been gaining interest in recent years in signal processing and 
related fields. Potentially substantial gains in performance can be achieved when observations are made 
sequentially and adaptively, making use of information derived from previous observations. This work 
focuses on sparse signals, i.e., signals that occupy a small number of dimensions in an ambient space. It 
is now well-known that compressed sensing offers an efficient non-adaptive strategy for acquiring sparse 
signals, relying on a relatively small number of observations that are incoherent with the basis in which 
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the signal is sparse (see e.g. H], O). However, when noise is present and sensing resources are limited, 
incoherent observations may not be the most efficient since a large fraction of the resources are allocated 
to dimensions where the signal is absent. Alternatively, by allocating resources according to estimates 
of the signal support obtained from past observations, better signal-to-noise ratios (SNR) are possible. 
Applications in which adaptive sensing of sparse signals can be readily utilized include surveillance using 
active radars 0, (H, spectrum sensing in cognitive radio 121, 0, and gene association and expression 
studies Q. 

Existing methods for adaptive sensing of sparse signals can be roughly grouped around two classes of 
models. In the first class, which is the focus of this paper, observations are restricted to single components 
in the basis that induces signal sparsity, while resources can be distributed arbitrarily over components 
and observation stages. An optimal two-stage allocation policy was developed in Q for a cost function 
related to bounds on estimation and detection performance. Subsequent developments stemming from 
include a modification to handle non-uniform signal priors JSJ, a simplification based on Lagrangian 
constraint relaxation |9), and a multiscale approach that uses linear combinations in the first stage to 
reduce the number of measurements [4]. Based on a similar model but in a different direction, a method 
known as distilled sensing iflOl was proposed for signal support identification and was shown to be 
asymptotically reliable (as the ambient dimension increases) at SNR levels significantly lower than non- 
adaptive limits. The distilled sensing idea was recently extended to a more general setting of sequential 
multiple hypothesis testing in ifTTj : in |[T2l it is shown that a sequential thresholding procedure comes 
within a small factor of the optimal sequential procedure in terms of the number of observations needed 
for asymptotically exact support recovery. 

In the second class of models, the observations can consist of arbitrary linear combinations, as in 
compressed sensing, but for the most part the resource budget is assumed to be discrete, measured 
in units of normalized observations ( ifTTll . |[T2l also assume a discrete budget). In [13], the distilled 
sensing approach was extended to the compressed measurement setting. In |[T4l . lfT5ll . a Bayesian signal 
model is adopted and each new observation is chosen to approximately maximize the information gain; 
ITT31 is computationally simpler but is most suited to signals with a single non-zero component, i.e., 
1-sparse signals. Others have also taken the approach of decomposing the problem into subproblems 
involving 1-sparse signals and then applying a form of bisection search |fT6ll , |[T7l , lfT8l ; 11181 employs a 
more sophisticated search in which the rate of division accelerates, reducing the complexity to doubly 
logarithmic instead of merely logarithmic. The adaptive methods in ifTTl , |fT8l were shown to require 
fewer measurements than the best non-adaptive method. In |[T6l and |fT8l however, noise is either not 
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considered or not fully taken into account. Somewhat different from the aforementioned works is fi~9l , 
which describes a compressed sensing method that is sequential in the sense that it terminates once the 
reconstruction error is determined to have fallen below a threshold, but the form of the measurements is 
not adapted during the process. 

Adaptive sensing and resource allocation have also been applied to other classes of signals with more 
structure. Tree-structured sparsity is considered in l f20ll . which proposes selective sampling of wavelet 
coefficients based on already sampled coefficients nearby and at coarser resolutions. For two-dimensional 
piecewise-constant signals, a method that concentrates measurements near boundaries is presented and 
analyzed in l2ll . ll22l . Adaptive waveform amplitude design is investigated in |[23l for unstructured 
(i.e. dense) parameter estimation in a linear Gaussian model under an average energy constraint. 

This paper addresses the problem of estimation and adaptive resource allocation under the first observa- 
tion model in which components are measured directly. We extend the two-stage allocation policy in 
to an arbitrary number of stages, focusing on estimation error explicitly as contrasted with performance 
bounds in |3|. Our method is computationally tractable for a wide range of estimation loss functions 
satisfying a mild convexity condition, including such commonly used criteria as mean squared error 
(MSE) and mean absolute error (MAE). The observation model in Q, ifTOl is also generalized by allowing 
the observation precision to depend on an arbitrary concave function of the sensing effort. It is shown 
that the problem can be formulated as a dynamic program, a framework that facilitates the development 
of allocation policies. An approximate dynamic programming solution is proposed based on open-loop 
feedback control (OLFC). The performance of these OLFC policies improves monotonically with the 
number of stages, and in particular improves upon optimal two-stage policies including the one in 0. 
Numerical simulations show error reductions up to 4.5 dB relative to the optimal two-stage policy and 
dramatic reductions relative to non-adaptive sensing, approaching the oracle limit at high SNR. The OLFC 
policies are also shown to outperform distilled sensing [10] at all SNR and most significantly at higher 
SNR. The advantages carry over to a radar imaging example that challenges some of the assumptions of 
our model. 

The remainder of the paper proceeds as follows. In Section JIJ the signal and observation models are 
specified and a problem of resource-constrained sequential estimation is formulated and then recast as a 
dynamic program. In Section [Till optimal and OLFC approaches to the problem are discussed and a family 
of OLFC policies is proposed. Numerical simulations comparing our OLFC policies to other policies are 
presented in Section [IV] In Section |Vj an application to radar imaging is described. Conclusions and 
future directions are given in Section [VT] 
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II. Problem formulation 

We consider signals that are sparse with respect to the standard basis (without loss of generality) in 
an iV-dimensional space. The signal support is represented by a set of indicators Ii, i = 1, . . . , N, with 
Ii = 1 if i is in the support and Ii = otherwise. We use a probabilistic model in which Ii = 1 with prior 
probability pi(0), independently of the other indicators. The non-zero signal amplitudes are modelled as 
independent Gaussian random variables Oi with prior means /Xj(0) and variances of(0). As in 01. fill, 
a non-informative uniform prior is assumed with pi(0) = po, jUj(O) = [Ao, and of (0) = <Tq for all i, 
although the theory developed below could also accommodate non-uniform priors. 

A sequence of T observations are made with non-negative effort levels Aj(i) that can vary with index 
i and time t = 0, . . . ,T — 1. Depending on the application, the effort Xi(t) might represent observation 
time, number of samples, energy, cost, or computation. It is assumed that the precision (inverse variance) 
of an observation varies with effort according to a non-decreasing function h. Specifically, given Aj(i— 1), 
the observation of the ith component at time t takes the form 

n(i) 

y i (t) = I i 6 i + lK) i = l,...,N, t = l,...,T, (1) 

^h(\i(t - 1)) 

where rii{t) represents i.i.d. zero-mean Gaussian noise with variance a 2 . The function h is often linear, but 
nonlinear dependences can also arise. For example, the sensing system may contain nonlinear components 
such as amplifiers, or the observations may result from integrating a continuous-time random process 
over an interval of length Aj(i — 1) and the process exhibits short-term correlation. In any case, it is 
reasonable to assume that h(0) = and h(X) > for A > 0, and to normalize h so that h(l) = 1. 
We restrict our attention to static signals so that the signal component I-fii in £[) does not change with 
time. For convenience, we use the vector notation y(t) = [yi(i) . . . yN{t)] T (similarly for other indexed 
quantities) and denote by Y(t) = {y(l), ■ ■ ■ , y(*)} tne history of observations up to time t. 

The task is to determine the distribution of sensing effort over components and time subject to a total 
budget constraint, 

T-l N 

^^A i (t)=A . (2) 

t=0 i=l 

Under the normalization Ao = N, each component receives an average of one unit of effort over time. In 
the case of single-stage non-adaptive sensing (T = 1) and a uniform prior, the most natural choice is to set 
Aj(0) = 1 for all i. Thus a 2 can be regarded as the noise variance realized under a non-adaptive uniform 
allocation. In multistage adaptive sensing, the allocation \(t) at time t can depend on the observations 
Y(t) collected up to that point. This information allows more resources to be focused on the region of 



5 



signal support, thereby improving the SNR. The mapping from Y(t) to X(t) is referred to as an effort 
allocation policy. For notational brevity, we will not make the dependence of \(t) on Y(t) explicit. 

To select the effort allocation policy, we seek to minimize the sum over the signal support of the 
expected losses associated with estimates 0j of the amplitudes 6>j, based on all observations up to time 
T: 



( N 
E h>L( 

u=i 



04 — 0i 



(3) 



where the expectation is taken over I, 6, and Y(T). The loss function L is assumed to be non-decreasing 
(and symmetric given the form in ([3])). To relate the expected cost ([3]) to the effort allocation policy, we 
nest the expectations in the order Y(T), I, 6 (outer to inner) and expand to yield 



r N 

e y(t) ]>>(t)e[l( 
U=i 



/i = l,Y(T) 



(4) 



where we have defined Pi(t) = Pr(/j = 1 | Y(t)). It is shown in Appendix lAl that Oi\I i = l,Y(t) has 
a Gaussian distribution for all t with mean iii(i) and variance crf(t). Since the Gaussian distribution is 
symmetric and strictly unimodal and the loss function L is also symmetric and non-decreasing, the inner 
expectation in © is minimized by choosing 0{ = Hi{T) (a proof of this fact is given in Appendix |B1). 
Then the minimum value of the inner expectation depends only on of (T) and © can be expressed as 



2E 



Y(T) 



' N 



L (<ji(T)0) 4>(9) d6 



(5) 



where (f)(0) denotes the standard Gaussian probability density function (pdf). The final-stage variance 
crf(T) depends in turn on the effort allocation according to the relation 



(6) 



which is derived in Appendix |A] In summary, the problem is to minimize the expected cost defined by 
© and © with respect to the effort allocation policy A(0), . . . , A(T — 1), subject to the total effort 
constraint ©. 

In the case of the square loss L(a) = a 2 , i.e., the mean squared error (MSE) criterion, the integral in 
© can be evaluated to yield af(T), thus reducing © to 



a 1 E 



Y(T) 



' N 

E 



Pi(T) 



(7) 



The cost function in © is closely related to the cost function in |3] although the motivations differ with 
the latter being related to Chernoff and Cramer-Rao bounds on detection and estimation performance 
respectively. The general form of the cost function in Q can be obtained from © by replacing Pi(T) 
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with the weighted average z/p;(T)+(l— Pi(T)) for v € [1/2, 1], letting a\ — > oo so that a 2 /a 2 — > 0, 
and choosing h to be the identity function. Given that the generalization of pi(T) to a weighted average 
is straightforward to accommodate, we keep v = 1 to simplify notation in the remainder of the paper. 

A. Formulation as a dynamic program 

The determination of an optimal effort allocation policy according to ([5]) and © can be formulated 
as a dynamic program. Although the dynamic programming viewpoint does not offer significant simpli- 
fications, it does make available a well-developed set of approaches to the problem, some of which are 
considered in Section |III] Further background in dynamic programming can be found in [24]. 

To formulate a sequential decision problem as a dynamic program, the cost function must be expressible 
as a sum of terms indexed by time t, where each term depends only on the current system state x(i) 
and the current control action, in our case the effort allocation \(t) (each term may also depend on a 
random disturbance but this is not required here). The cost function © can be recast in the required 
time-separable form by defining the state x(t) as x(i) = (p(i), A*(t), cr 2 (t), A(i)), where A(t) represents 
the effort budget remaining at time t. The state variables are initialized as Pi(0) = po, pi(0) = po, 
of (0) = of, and A(0) = Aq, and evolve according to the following recursions derived in Appendix lAl 



Pi{t + l) 



Pi(t)<f>l 



(8a) 



af(t + l) 



Piit^x + il-piit))^ 
C 7 2 /x i (t) + /t(A i (t))a t 2 (t)^(t + l) 



(8b) 



(8c) 



N 



A(t + 1) 



A(t) -$>(*)> 



(8d) 



i=i 



where 



= 4>{y i (t + l);0,a 2 /h(\ l (t))), 



01 = <KVi(t + 1); m(t), a 2 {t) + a 2 /h{Xi{t))) 



and (/>(•; p, a 2 ) denotes a Gaussian pdf with mean p, and variance a 2 . 

Given the above state definition, we use (f8cl ) to rewrite the denominator in © as 



4 + E MM*)) = 2 ( r n + ^ A ^ r " : ))" 



(9) 
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We then decompose Ey(t) as Ey(T-i) E y ( T )i Y (T-i) an( i notice that only Pi(T) in © depends on y(T). 
Taking the expectation of (l34l) with respect to y(t) \ Y(t — 1), we deduce that 

E y(t) {pi(t) | Y(t-l)}= Pi (t-l), t = l,...,T. (10) 

Using ©, (O, and ([TOl) . the effort allocation problem may be stated as 



min Ey(t-i){G(x(T-1),A(T-1))} 
X(0),...,A(T-1) 

T-l 7Y 
t=0 i=l 



(11) 



where the cost function is of the desired form with a single non-zero term at time T — 1, 

N 

G(x(T - 1), A(T - 1)) = 5>(T - 1) 5 K 2 (T - 1), h(Xi(T - 1))), (12) 

i=i 

— Z" 00 I (70 \ 

g(aUt), K)= L <j>(0) d9, (13) 

depending explicitly on p(T — 1), <r 2 (T — 1), and A(T — 1). The dependence on the variables A(i), 
t = 0, . . . , T — 2 is implicit through the probability distribution of the observations Y(T — 1) and the 
recursions in ©. The constraints in (fTTb actually represent a continuum of constraints since they are 
required to be satisfied for all realizations of Y(T — 1). 



III. Effort allocation policies 

In this section, we develop policies directed at solving the effort allocation problem (fTTb . Optimal 
policies are discussed in Section IIII-AI while a less complex method known as open-loop feedback 
control is discussed in Section IIII-Bi We then discuss two approaches to improving the performance of 
OLFC: generalized OLFC in Section UlI-CI and policy rollout in Section Ull-Di 



A. Optimal policies 

In principle, it is possible to employ exact dynamic programming to determine an optimal policy for 
(fTTb . The dynamic programming approach decomposes (fTTb into a sequence of optimizations proceeding 
backward in time, making use of the chain rule Ey(t-i) = ^V(i) Ey(2)|Y(i) • • • ^ y (T-i)\Y(T-2) an( l tne 
fact that each allocation A(t) is a function of past observations Y(i) but not future ones. The last-stage 
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optimization is given by 



Jt-a (xfT — 1)) = min 

A(T-l) 
S.t. 



G(x(T-l),A(T-l)) 

iV 

$>(T-1) = A(T-1), 
i=i 

Aj(T-l)>0 Vi, 



(14a) 



and for i = T — 2, T — 3, . . . , 0, the optimizations are denned recursively as follows: 

■#(*(*)) = mm E y(m) {J? +1 (x(i + 1)) | x(i), A(t)} 

TV (14b) 

s.t £Ai(t)<A(t), A,(t)>0Vi 
i=i 

The desired optimal cost in (fTTT ) is Jq(x(0)). The notation in (|14b| ) reflects the fact that the distribution 
of y(t+ 1) given Y(i) is completely determined by x(i) and A(i); more specifically, f(yi(t+ 1) | Y(i)) 
is given by the denominator of the right-hand side of (l8al ) as can be seen from d36l ). The next state 
x(t + 1) is specified by x(i), A(i), and y(i + 1) through ([8]). Thus the choice of X(t) depends on Y(i) 
only through the state x(i), which is a property of dynamic programs ll24l . 

An optimal policy can be obtained by first solving (114al) for A(T — 1) and then using the result in 
d!4b| ) to solve for A(T — 2). The remaining allocations are determined in the same recursive way. This 
exact procedure is computationally tractable in a few cases. For T = 1, it suffices to solve (I14al ). which 
is a convex optimization problem under some conditions to be discussed in Section IIII-BI For T = 2 and 
a uniform prior (pi(0) = po, /Ui(0) = fio, of (0) = Oq), symmetry allows the initial allocation A(0) to be 
restricted to the form A(0) = j3^ 2 \G)l, where 1 denotes a vector with unit entries. Thus (114bb becomes a 
one-dimensional optimization with respect to the multiplier f3^ 2 \o). For fixed /3^(0), the expectation in 
d!4b| ) can be evaluated by sampling from the distribution of y(l) and then solving (|14ab for the resulting 
values of the state x(l). 

For T > 2 however, an exact solution via (1 14ab and (1 14bb is very difficult. Problem (1 14bb is in general 
an Af -dimensional optimization with no known structure and A^ potentially large. Furthermore, a common 
approach in which the cost-to-go J t * +1 (x(t+l)) in d 14bl > is summarized by its values at a small number of 
representative states x(t + 1) is made difficult by the high dimension (3A r + 1 to be exact) and continuous 
nature of the state. For these reasons, we do not consider an exact solution to (fTTT ) for T > 2, opting 
instead for an approximate method as is discussed next. 
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B. Open-loop feedback control 

A well-known approach to approximate dynamic programming is that of open-loop feedback control 
(OLFC) |[24ll . We consider the problem of determining the allocation \(t) at time t given the current set of 
observations Y(i), or equivalently the state x(i). In OLFC, this computation is simplified by assuming 
that future allocations \(t + 1), . . . , A(T — 1) can depend only on Y(t) and not future observations. 
In light of this assumption, the only quantities that depend on y(t + 1), . . . , y(T — 1) in (fT2l) are the 
probabilities Pi(T — 1). The expectations E y ( t+1 )| Y (t) • • • ^ y (T-i)\Y(T-2) m CCD can tnen be applied to 
transform pi(T — 1) into Pi(i) using ( fTOl ) repeatedly. The resulting cost function is to be optimized with 
respect to \{t), . . . , A(T — 1) jointly, leading to the problem 

N I T-l \ 

i-l V / 

T-l AT 

S - L EE A 'W=AW, Ai(r)>0Vr,i, 

r=t i=l 

where we have made use of a rearrangement similar to (©. The budget constraint in ( fT5l ) is assumed to 
be met with equality as otherwise the cost could be decreased. 

For t = T—l, the OLFC problem (031 ) coincides with the last-stage optimization in (1 14al) . For t < T—l, 
OLFC represents a significant simplification relative to the exact optimization in (1 14bb because the cost 
function in ( fT51 ) is expressed explicitly without the need to evaluate expectations recursively. Under certain 
conditions specified in the following proposition, problem (fT31) is also a convex optimization and thus 
can be tractably solved. 

Proposition 1: The OLFC problem (fT31) is a convex optimization problem if the loss function L is 
non-decreasing, g(af(t),hi) in (fT3T ) is a convex function of hi for hi > and all of (i), and the effort 
function h is concave. 

Proof: Since the constraints in (fT31) are all linear, the feasible set is convex (more precisely a simplex). 
The cost function is a non-negative combination of functions g(af(t), hi) with hi = J2r=t ^(^i( r ))» so 
it suffices to prove that g is convex as a function of \i(t), . . . , Aj(T — 1). First note that hi, as a sum of 
concave functions, is concave in \i(t), . . . , Aj(T — 1). Given that L is a non-decreasing function of its 
argument, g is seen to be a non-increasing function of h{. Furthermore, we may extend the definition of g 
to negative hi by letting g(af(t), hi) = oo for hi < 0, thereby preserving the monotonicity and assumed 
convexity of g. It then follows from a property of compositions of functions E51 that g is convex in 

\i(t),...,Xi(T-i). m 

The assumptions in Proposition Q] are not difficult to satisfy. It was already assumed in Section UTI that 
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L is non-decreasing so that the optimal amplitude estimate 0j is equal to the conditional mean /Xj(T). 
The concavity assumption on h is satisfied by the identity function as well as functions corresponding 
to a sublinear dependence of the observation precision on sensing effort. The convexity assumption on 
g is satisfied by a variety of commonly used loss functions. As a first example we consider the 0-1 loss 
function for a tolerance e, 

{0, < a < e, 
1, a > e. 

The integral in ( fT3l) may be evaluated in this case to yield 

g(af(t),h t ) = Q (y**/ot(t)+Ti?) , 

where Q denotes the Q-function, i.e., the standard Gaussian tail probability. Since the Q-function is 
convex decreasing for non-negative arguments and the square root function is concave in hi, the same 
property used in the proof of Proposition Q] may be invoked to conclude that g is a convex function of 
hi. The convexity of g can also be verified for L(a) = 1 — e~ ha with b > 0, which can be regarded as a 
continuous approximation to the 0-1 loss function. 

The assumption that g is convex may be replaced by one of the following stricter but more easily 
checked conditions: 

(a) L (a/ ' \J a 2 /af(t) + h^j is a convex function of hf, 

(b) L is convex. 

Condition (a) implies that g is convex because scaling the argument of L by 9 does not affect convexity 
and because the weighting function (f)(9) in the integral in (fl"3T ) is always positive. Condition (b) implies 
condition (a) because of a composition property similar to the one used earlier and the convexity of 
a / \J a 2 /af(t) + hi with respect to hi. If L is twice differentiable, condition (a) can be shown to be 
equivalent to the inequality 

aL"(a) + 3L'(o) > 0, a > 0, (16) 

whereas (b) is equivalent to L"(a) > 0. Condition (b) includes the square loss L(a) = a? corresponding 
to MSE, the linear loss L(a) = a corresponding to mean absolute error (MAE), the Huber loss which 
combines the square and linear losses in a continuous and convex manner, and the two-sided hinge loss. 
More generally, ( fT6l ) is satisfied for any power-law function L(a) = a q with q > and for L(a) = 
log(l + ba) with b > 0. Note that a q for < q < 1 and log(l + ba) are concave functions of a. Taking 
the limit as q — > of the power-law functions yields the 0-1 loss function, which was shown earlier to 
result in a convex g. 
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In the remainder of the paper, we assume that the assumptions of Proposition Q] are satisfied and hence 
the OLFC problem (fl3T ) is a convex optimization. We now address the solution of (fT5T ). The cost function 
in ( fT51 ) depends on \(t), . . . , A(T — 1) only through the quantities hi = J2r=t M^i( r ))> an d * s more 
specifically a non-increasing function of hi as argued in the proof of Proposition [TJ Therefore (031 ) may 
be solved via a two-step procedure: first we fix Aj(t) = J2 T =t A ?( r ) an d see k to maximize hj as functions 
of Aj(i), i.e., 



T-1 



= A m w r _ 1) 51 /l ( A ^( r )) 



T-1 



s.t. ^ Ai(r) = Xi(t), 



(17) 



Ai(r)>0 VT,i, 

and then we substitute the maximum values /i*(Aj(i)) into (fT3T ) and optimize with respect to Aj(t). The 
maximum h*(Xi(t)) can be determined by noting that (flTl) is a concave maximization problem subject 
to a simplex constraint. For such problems, we have the following necessary and sufficient optimality 
condition: 

if A*(r)>0 then -^r-^^rr, VrVr, (18) 

where the partial derivatives are evaluated at the optimum. The solution A*(r) = Xi(t)/(T — t) for all 
r satisfies (fl~8T > by symmetry since all of the partial derivatives are equal. The corresponding maximum 
value is therefore h*(\~i(t)) = (T - t)h(\i(t)/(T - t)). Note however that the optimal solution to (fTTT ) 
may not be unique if h is not strictly concave. In particular, if h is the identity function, then hi = Xi(t) 
regardless of the choice of A«(i), . . . , \i(T — 1). We return to the issue of non-uniqueness in Section 

EEC] 

With the substitutions Er=t H T ) = and Er=t H x i( T )) = i T ~ t)h(Xi(t)/(T - t)), O 

simplifies to 



min $>i(*)9 [af(t),(T-t)h 



_ / j r h\" /a I * V / ' \ / I rr-> # I I 

\ 

N 

s.t. ^A,(t) = A(t), Ai(t)>0Vi, 

i=l 

a simplex-constrained convex minimization problem. Problem ( fT9l ) thus satisfies an optimality condition 
similar to (TT8T ) with the inequality between partial derivatives reversed in direction. This condition implies 
that optimal solutions to ( fT9l ) have certain properties akin to water-filling. First, the solutions exhibit 
thresholding in the sense that X*(t) must be zero if the corresponding partial derivative is not among 
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the lowest. Second, the partial derivatives corresponding to non-zero components must all be equal. This 
in turn induces an ordering among the non-zero allocations as a function of the probabilities pi(t) and 
variances of (i). 

To illustrate the properties of optimal solutions to ( fT9l ), we specialize to the case of power-law losses 
L(a) = a q and the identity effort function h(X) = A. In this case, ( fT9l ) reduces to 

mm > 

' \ " /". \< > r -'Vi ' ) / ■ 

(20) 

N v ' 



N 

s.t. ^A i (t) = A(t), Ai(t)>0Vi 



i=i 



and the optimal solution can be stated explicitly. A detailed derivation is provided in Appendix First 
we define 7 = 2/(q + 2) and it to be an index permutation that sorts the quantities pj(t)af(t) in 
non-increasing order: 

P£(i) (*)<!)(*) ^ ^(2)(*K(2)(*) > • • > vl iN) (t)al {N) (t). (21) 

Next define b[k) to be the monotonically non-decreasing function of k = 0, 1, . . . , N with b(N) = 00 
and 

= 3 ^ E p2( (*) " E fc = 0, . . . ,AT — 1. (22) 

Then the optimal solution A*(t) to (1201 is given by 

^Bh"' 1 " 1 ""^' ' ' (23) 



where 



0, i = fe + l,...,JV, 

C = E — (24) 

S=iP^-)(t) 

and the number of non-zero components k is determined by the interval (b(k — l),b(k)] to which the 
budget parameter A(t) belongs. The monotonicity of b{k) ensures that the mapping from A(t) to k is 
well-defined. We note that k and C could also be computed using the general procedure in [26]. The 
thresholding property is clearly seen in (l23l) . Furthermore, the non-zero allocations increase with the 
probabilities Pi(t) raised to the power 7 and decrease with the precisions l/af(t). 

In the case of general loss and effort functions, ( fT9l ) may not have an explicit solution as in (|2TI)-(|241. 
Nevertheless, an efficient iterative solution is possible under the assumption of convexity. One possibility 
is to use a projected gradient algorithm, taking advantage of the ease of projecting onto a simplex. 
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The solution to ( fT9l ) specifies the values of the sums Aj(i) = 2~2r=t ^i( T )- However, the solution 
to (fTTT ) may not uniquely specify the division of Aj(i) into Aj(i), . . . , Aj(T — 1) if the effort function 
h is not strictly concave. In addition, since the OLFC optimization (fT9l ) is similar to the last-stage 
optimization (I Hal ), the resulting policy can be somewhat aggressive in allocating effort to components 
currently believed to contain signal as opposed to waiting for further confirmation. In the next subsection, 
these issues are addressed through a generalization of the OLFC approach. 

C. Generalized open-loop feedback control 

In this subsection, we discuss two modifications to the OLFC policy in Section IIII-BI The first 
modification is directed at optimizing the distribution of effort over stages and applies to all loss 
functions. The second modification reduces premature exploitation and is presented only for power-law 
loss functions; similar strategies could be devised for other loss functions. 

To optimize the allocation of effort over stages, we restrict the allocation for the current stage X(t) to 
be proportional to the optimal solution A*(i) of OH), i-e., A(i) = /3 (T) (t)A*(t), where /3 (T) (t) G [0,1] 
represents the fraction of the remaining budget A(t) that is used at time t and the superscript T denotes 
the total number of stages. The fractions ^ T \t) are chosen based on a generalization of the optimal 
policies for T = 1 and T = 2 in Section IIII-AI Both of these optimal policies belong to the OLFC 
class. Specifically, the T = 1 policy results from solving (114al ). which is a special case of ( fT9l ) with 
t = T - 1, and setting /3W(0) = 1 since there is only one stage. The T = 2 policy uses an initial 
allocation A(0) = /3 (2) (0)1, which is of the same form as the solution to (fT9l ) for t = under a uniform 
prior, followed by the solution to (fT9l ) for t = 1 scaled by /?( 2 )(1) = 1. Note that the second stage in the 
T = 2 policy is identical to the T = 1 policy with /3 (2) (1) = /3 (1) (0). For T > 2, we follow the same 
strategy of reusing the (T — l)-stage fractions in the T-stage policy, setting (3^ T \t) = /3( T_1 ^(t — 1) for 
t = 1,2, . . . ,T — 1. The first-stage fraction (3^ T \0) is then optimized as described below. 

The second modification is to allow the exponent 7 in (|2~TI)-(|241) to vary with time. The last-stage 
exponent ^ T \T — 1) is set to 2/(g + 2), the optimal exponent for the loss function L(a) = a q . In earlier 
stages, smaller exponents are used to make the policy more conservative, specifically by weakening the 
dependence on the probabilities Pi(t). We propose the simple strategy of optimizing only the first-stage 
exponent 7^ (0) and constraining the remaining exponents to linearly interpolate between 7^ (0) and 
<y(- T )(T — 1). This reduces the determination of the fractions fi^ T \t) and exponents ^ T \t) to a two- 
dimensional optimization regardless of the number of stages. 

The first-stage parameters /3^(0) and 7^(0) are determined recursively for T = 1,2,... starting 
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from (3^(0) = 1 and 7 (1) (0) = 2/(q + 2). Define J t (T) (x(t)) to be the cost-to-go of a T-stage policy in 
this family starting from time t and state x(t). Then for T > 1, p( T \o) and 7(^(0) are given by 

(> T )(0),7 {T) (0)) = arg min E y(1) |j{ T) (x(l)) | x(0),/3A*(0)} . (25) 

7<2/(9+2) 

The parameters /3 (T) (1), • • • ,/? {T) ( T - 2) and 7 (T) (1), • • • ,7 (T) ( T ~ 2 ) required to evaluate j} T) are 
specified by the (T — l)-stage policy and the choice of 7. The expectation in (|25T ) can be computed 
by sampling from the distribution of y(l), determining the state x(l) using ([8]), and then simulating 
the remainder of the policy. All of these computations can be done offline since they depend only on 
the initial state x(0) and previously determined policies. In addition, since the optimization in (l25l) can 
partially account for the effect of future observations on future allocations, an effect that is ignored in 
the OLFC simplification, the optimization over (3 is performed even in the case of strictly concave h. 
Otherwise, (fT71) would yield a uniform distribution over stages corresponding to P^ T \t) = 1/(T — t). 

The family of generalized OLFC policies defined above satisfies the following monotonic improvement 
property. 

Proposition 2: The cost of the generalized OLFC policies is non-increasing in the number of stages, 
i.e., 

J<S T) (x(0))< jf-^CxCO)), T = 2,3,.... 
Proof: The cost of the T-stage policy is given by 

jf ) (x(0))= mjn E y(1) {jf)(x(l)) | x(0), /3A*(0)} . (26) 

7 <2/(g+2) 

Consider fixing /3 = and 
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^ T = 2 > 

^7^(0)-^, T>2 
on the right-hand side of (l26l ). With /3 = 0, the observations y(l) are not taken, the state x(l) is 
unchanged from x(0), and the budget usage fractions are the same as in the (T — l)-stage policy. It can 
also be seen from the choice of 7 that the exponents are the same as for T — 1, and hence the right-hand 
side of d26l ) reduces to Jq (x(0)). The claim then follows. ■ 
Proposition [2]implies in particular that the generalized OLFC policies for T > 2 improve upon the optimal 
policy for T = 2. The corresponding performance gains are quantified through numerical simulations in 
Section |IVJ 
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D. Rollout OLFC policies 

We now discuss a different approach to improving the performance of OLFC based on the dynamic 
programming technique of policy rollout |[24l . For simplicity, we assume that the exponent 7 in (l2TT)-(l24l) 
is fixed to 2/(q + 2) in all stages, unlike in Section BlI-CI In this subsection only, we also make the 
same assumption for the generalized OLFC policies, i.e., the only parameter optimized in (1251 ) is f3^ T \0). 
Rollout could also be applied in the case of time-varying 7 by changing the optimization over (3(t) in 
(l27l) below to a joint optimization over and 7(t). 

In the last stage t = T — 1 of a rollout policy, the allocation is determined as before by solving dl4a| ), 
or equivalently by solving (031 ) with budget usage fraction ^{T - 1) = 1 (we use a tilde to distinguish 
the rollout fractions from those in the generalized OLFC policies). For t = 0, 1, . . . , T — 2, the fraction 
j3^ T \t) is determined according to 

^ T )(i)=arg mm E y(t+1) { + 1)) | x(i),/?(i)A* (t)} , (27) 

where J^(x(t + 1)) is the cost-to-go of the T-stage generalized policy. Thus ^ T \t) is chosen assuming 
that future stages follow the generalized policy. The corresponding cost-to-go J t+ { (x(i+l)) can be viewed 
as an approximation to the optimal cost-to-go J t * +1 (x(i + 1)) in d 14bb . Comparing (|2VT > with (f25j (and 
assuming that ^(t) = 2/(q + 2) for all t), it is seen that f3^ T \0) = (3^(0). In other stages however, the 
rollout fractions differ from those in the corresponding generalized policy because they are re-optimized 
based on the value of the current state x(t) instead of being taken directly from a policy with fewer 
stages. 

In general, rollout policies have the property of improved performance over the policies on which they 
are based. The same holds for the present rollout policy, with the difference being that the optimization 
in (1271 ) is restricted to a line search over /3(t). Denoting by J^ T ^(x(t)) the cost-to-go of a T-stage rollout 
policy starting from time t and state x(t), we have the following result: 

Proposition 3: The T-stage rollout OLFC policy has a lower cost-to-go than the corresponding gen- 
eralized OLFC policy in all stages and states, i.e., 

j; {T) (x(t)) < jf ^(t)), t = 0,...,T-l, Vx(t). 

Proof: The proof is based on ll24l Sec. 6.4]. For t = T — 1, the two policies coincide so the costs-to- 

~fT*) (T) 

go are the same. Assume inductively that J t+ ^(x(i + l)) < J t+ -[(x(i + 1)) for all x(t + l). The cost-to-go 
of the rollout policy is defined by 

jP(x(t)) = E y(t+1) {j?2(x(i + 1)) I x(t)J^(t)T(t)} 
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and similarly for the nested policy. By the induction hypothesis and the definition of the rollout policy 

jf } (x(t)) < E y(t+1) { jS(x(t + 1)) | x(t), P<r>{t)T{t)} 
< E y(t+1) { jS(x(i + 1)) | x(t),^ T Ht)T(t)} 
= J t (T) (x(i)) 

for all x(t) as required. Note that the second inequality depends on the generalized policy being included 
in the class over which the rollout policy is optimized. ■ 
The rollout OLFC policies can make greater use of knowledge of the state x(i) but are consequently 
more demanding computationally than the generalized OLFC policies. Instead of a single optimization 
in (|25T ). T — 1 optimizations as in (|27T ) are required. Furthermore and in contrast to (|25T ). (|27T ) must 
be evaluated online since it depends on the current state x(t). The simulations involved in computing 
the expectation in (|27T ) do become shorter however as t increases. The improvement due to rollout is 
characterized through numerical simulations in Section JV] 

IV. Numerical simulations 

Numerical simulations are used to evaluate the OLFC policies developed in Section JII] The monotonic 
improvement property of Proposition [2] is verified and gains up to several dB are observed relative to the 
optimal two-stage policy. The proposed policies are also seen to consistently outperform distilled sensing 
(DS), most significantly at higher SNR. 

In the simulations, we set N = 10000 and generate signals and observations according to the model 
in Section JI] Except where indicated, the signal mean fiQ is normalized to 1 and the signal standard 
deviation o"o is set to 1/4. 

Two families of generalized OLFC policies are considered, one optimized for MSE (final exponent 
j(T)(r - 1) = 1/2, denoted OLFC-MSE) and the other for MAE {j {T \T - 1) = 2/3, denoted OLFC- 
MAE). The number of stages T is varied from 2 to 10 and the final estimate is given by /x(T). In the 
offline determination of the parameters f3^ T \t) and j^(t), the optimization in ( f2Sb may be inexact 
because of finite-sample approximations to the expectations. To mitigate such errors, we make use of 
the empirical observation that (0) and (0) appear to vary smoothly with SNR, and j3^ (0) also 
appears to decrease monotonically with T. Accordingly, we first obtain raw estimates of /3^ T \0) and 
7^ T ) (0) and then perform a polynomial fit as a function of SNR, where the polynomials for /?( T ) (0) are 
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constrained to satisfy /3 (T) (0) > /?( T+1 )(0) for all T. In our experience, a polynomial degree of 6 is 
sufficient to capture the variation of the parameters over the SNR range considered. 

For the rollout OLFC policies, the fractions (t) in (f2Tb are also determined through finite-sample 
approximations to expectations and are thus subject to the same type of error. The difference as noted 
in Section IIII-DI is that (T27T ) must be evaluated online, and hence the number of samples is limited by 
computational constraints. To circumvent this tradeoff, we again make use of an empirical smoothness 
property, this time of the expectation in (|2"7T ) as a function of /3(t). Approximations to the expectations 
are first obtained using a relatively small number of samples, and a fourth-order polynomial in /3(t) is 
fit to the approximation. The polynomial fit is then minimized to determine f3^ T \t). Note that /3(i) = 1 
corresponds to a single-stage policy whose cost can be computed exactly from the current state x(£) as 
described in Appendix ICl Thus f3(t) = 1 and its corresponding single-stage cost represent a fixed point 
that constrains the polynomial fit. 

For DS, while [10 ] prescribes a single value for T as a function of the dimension JV, in our simulations 
we consider all values of T between 2 and 10 as with OLFC. Following ifTOl , we use a geometrically 
decreasing allocation of effort over stages with decay ratio 3/4 and equal first and last stages. More 
precisely, defining a(t) as the fraction of the total budget used in stage t, we have a(t) = a(0)(3/4)* 
for t = 1, . . . , T - 2, a(T - 1) = a(0), and q(0) chosen such that J^Jjo «(*) = 1 - 

In Fig. [T] we plot the MSE and MAE for various policies as a function of SNR, where SNR is 
defined as 101og 10 (/io/cr 2 ) in dB. Each point represents the average of 4000 simulations. The baseline 
corresponding to dB on the vertical axis is the optimal non-adaptive policy, which under a uniform 
prior allocates one unit of effort to all components. For context, we also include the oracle policy, which 
distributes effort uniformly over the true signal support. The oracle thus provides an upper bound on the 
achievable performance, although the bound is unlikely to be tight at lower SNR. 

In general, adaptivity yields higher gains for sparser signals (po = 0.01) since resources can be 
concentrated on fewer components once the support is identified. The 10-stage generalized OLFC policies 
improve upon the 2-stage OLFC policies as expected. The largest gains occur at intermediate SNR and 
reach 1.5 dB for p = 0.1 and 4.5 dB for p Q = 0.01. Recall that the 2-stage OLFC-MSE policy is optimal 
in terms of MSE for T = 2, and similarly for OLFC-MAE. Note also that the performance is only slightly 
affected by a mismatch between the OLFC policy and the loss function. At high SNR, the OLFC policies 
approach the oracle gain, which in turn approaches the sparsity factor 1/po- In contrast, the DS policies 
saturate at significantly lower levels since they are not designed with estimation performance in mind. 
While the 10-stage DS policy outperforms the optimal 2-stage policy at lower SNR, the 10-stage OLFC 
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Fig. 1. Reduction in MSE (first row) and MAE (second row) relative to non-adaptive estimation as a function of SNR. The 
10-stage generalized open-loop feedback control (OLFC) policies improve upon the 2-stage OLFC policies with maximum gains 
of 1.5 dB for po = 0.1 and 4.5 dB for po = 0.01. The 2-stage OLFC policies are optimal for T — 2. As the SNR increases, 
the proposed OLFC policies approach the oracle gain of 1/po and outperform distilled sensing (DS) by several dB. 



policies have the best performance at all SNR. 

Fig. [2] shows decreases in MSE with the number of stages T. The incremental gains predicted by 
Proposition [2] diminish as T increases. Using more stages is more beneficial at lower SNR and higher 
sparsity, whereas at higher SNR most of the signal components can be located in a single step and a 
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Fig. 2. MSE reduction as a function of the number of stages T. Gains diminish as T increases but less quickly at lower 
SNR and higher sparsity. In all cases shown, the proposed OLFC-MSE policy with 5 stages performs better than a 10-stage DS 
policy. 



two-stage OLFC policy performs almost as well as a policy with many more stages. The gains for DS 
do not diminish as quickly but are lower overall, never exceeding the gain of the corresponding 5-stage 
generalized OLFC policy. 

In Fig. [3l we consider the performance improvement due to policy rollout, as guaranteed by Proposition 
U For this experiment only, the exponent 7 in (|2~T1)-(|2"41) is fixed at 2/(q + 2) = 1/2 (q = 2 for MSE). 
The dimension N is lowered to 1000 and the results are averaged over only 1000 simulations because of 
the higher computational complexity of rollout. For pn = 0.1 in Fig. |3(a)| no decrease in MSE is seen, 
whereas for p$ = 0.01 in Fig. |3(b)[ the decrease is never more than 0.6 dB. It appears therefore that for 
the problem at hand, the performance gained from rollout is minimal while the computational cost of 
the required online simulations is much greater. 

Fig. |4] depicts the fraction a^ T \t) of the total budget allocated to each stage in a 10-stage OLFC-MSE 
policy for different SNR levels and po = 0.01. The fractions a^ T \t) are related to the fractions ^ T \t) 
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of the remaining budget through a straightforward transformation. Three regimes may be distinguished 



in Fig. |4(a)| At very low SNR, it is difficult to identify the signal support and the allocation is close to 
uniform. Between and 25 dB SNR, the allocation is heavily weighted toward earlier stages. As seen in 
Fig. |4(b)[ the decrease with time is reminiscent of the geometric decay prescribed by distilled sensing. 
Above 25 dB SNR, the support can be determined with relatively little effort and an increasing fraction 
of the budget is reserved for the last stage to exploit this knowledge. 

The proposed policies are based on a Bayesian framework and are thus dependent on prior knowledge 
of the expected sparsity level and SNR, specifically in the form of the parameters po, no, and a 2 . If 
these prior parameters are misspecified, the correct values can be learned through the Bayesian update 
process ([8]) but some degradation in performance is to be expected. To assess the effect of mismatched 
priors on the generalized OLFC policies, a series of experiments are conducted in which one of po, fio, 
or a 2 is misspecified. In Fig. |5(a)[ the true sparsity level po is 0.1 while the value p' assumed by the 
policies is either 0.1 or 0.01. The performance loss of the OLFC-MSE policies is rather mild given the 
order-of-magnitude underestimate of pq. Similar results are seen when po is overestimated. DS on the 
other hand does not make use of the parameter p' and is therefore unaffected. 

In Figs. |5(b)l and 5(c) po is set to 0.01 while the signal mean [i' assumed by the policies is either correct 
or off by ±4 dB. The signal standard deviation <r is also changed to 0.40, making the mean mismatches 
on the order of one standard deviation. As can be seen from (1851 , a misspecification of jiQ leads to a 
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Fig. 4. Fraction of total budget allocated to each stage in a 10-stage OLFC-MSE policy for po = 0.01 and (a) all SNR values, 
(b) SNR = 0, 10, 20, 30 dB. Three regimes can be seen in (a): a nearly uniform regime below dB SNR, a decaying "distilling" 
regime between and 25 dB, and a near-oracle regime above 25 dB with increasing emphasis on the last stage. 



biased estimate /x(T), although the bias can be reduced by allocating more effort to the observations. 
It is clear from Figs. |5(b)[ c) that overestimating fiQ results in more significant losses due to missed 
detections of weaker than expected signal components, especially at high SNR. In contrast, when fiQ is 
underestimated, the reduction in MSE relative to nonadaptive sampling can actually be greater than in the 
matched case; this can be attributed to a reduction in bias. In both cases, the OLFC-MSE policy remains 
better than DS. The consequences of misspecifying a 2 are less severe than for /j,q with underestimating 
a 2 being worse. These findings suggest that the policies are more sensitive to overestimates of the SNR 
than underestimates. 



V. Application to radar imaging 

In this section, the proposed allocation policies are applied to a radar imaging example also considered 
in O. The original synthetic aperture radar (SAR) image [27] shows 13 tanks in a large field and is 
therefore sparse in terms of targets. In the adaptive setting, it is assumed that the position and dwell 
time of the radar beam can be controlled, and our goal is to illustrate the benefits of such adaptivity in 
acquiring sparse targets. 

We assume a Swerling II target model, commonly used in radar [28], in which the observation Zi(t) 
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Fig. 5. Reduction in MSE relative to non-adaptive estimation as a function of SNR under mismatches in prior parameters. In 
(a), po is underestimated by an order of magnitude and the effects are minor. More severe losses are seen in (b) with a 4 dB 
overestimate of po- In (c), /io is underestimated by 4 dB and the losses are again modest. In all cases, the proposed OLFC-MSE 
policy remains better than DS. 



of location % in stage t is given by the empirical mean 

1 K,(t-1) 
K i\ l ~ L ) s= i 

where the Zi s (t) are i.i.d. exponential random variables with mean equal to the true target amplitude Xj, 
and Ki(t — 1) is the number of radar pulses. The total budget consists of NP pulses and the average 
number of pulses per location P is thus equivalent to SNR. 

The Swerling observation model presents a test of robustness of the policies to non-Gaussianity. Results 
obtained under Gaussian and speckle noise are similar. In addition, several accommodations are made 
to better conform to the model in Section ITT] Most notably, while the targets in Fig. 6(a) are indeed 
sparse, they have non-negligible spatial extent and their amplitudes are not uniformly different from the 
background. Therefore each observed image z(t) is preprocessed with a matched filter, following the 
approach in Q and using the same approximate tank template as in [ 3 . Fig. 6]. The effort allocation 
policies are then applied with the filtered images y(t) as input. We use po = 0.001 as the initial sparsity 
estimate in the filtered domain. The initial signal mean {1q and variance ctq are estimated from the first- 
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stage observations yj(l) above the 1 — po quantile, while the background mean (generally nonzero) and 
variance a 2 are estimated from the yi(l) below the 1 — po quantile. Once the allocation X(t) has been 
determined, it is mapped to a pulse allocation n(t) in the unfiltered domain by convolving \(t) as an 
image with the support of the tank template (a binary image) and normalizing so that J2i = J2i 
The allocation n(t) is then rounded to satisfy the integer restriction, again while preserving the sum. 
The reconstructed image x is formed as a maximum-likelihood estimate of x based on z(l), . . . , z(T): 

Et=i«i(*) 

In the non-adaptive single-stage case, this reduces to Xi = Zi(l) with «j(0) = P in (|28T ). Fig. [6] shows a 
120 x 120 portion of the original image (the full 450 x 570 image is used in processing) together with 
reconstructions from P = 3 pulses per location. We focus attention on the targets of interest, namely 
the tanks. In the non-adaptive reconstruction in Fig. |6(b)[ the tanks are obscured by noise. Better images 
result from the two adaptive policies. The OLFC reconstruction however shows greater noise suppression 
around each tank and recovers amplitude details more faithfully. 

In Fig. |7J we show one-dimensional profiles passing through the line of tanks. The middle curves 
indicate the true image intensities while the upper and lower curves correspond to one standard deviation 
above and below the mean reconstruction for each policy, where the mean and standard deviation are 
computed from 100 realizations. The number of pulses per location is P = 2. The variability in the 
reconstruction is clearly reduced using OLFC, in particular in the higher-amplitude regions corresponding 
to targets. The 5-stage OLFC policy further reduces the standard deviation by 2-3 dB relative to the 2- 
stage OLFC policy. 

VI. Conclusions and future work 

We have presented multistage resource allocation policies for the sequential estimation of sparse signals 
under a variety of loss and effort functions. Our formulation of the problem permits the application of 
techniques from dynamic programming, in particular open-loop feedback control. The proposed policies 
improve monotonically with the number of stages and thus extend the optimal two-stage policy developed 
in 0. Simulations and a radar imaging example also show gains relative to distilled sensing [10] and 
dramatic improvements relative to non-adaptive sensing. 

The dynamic programming approach taken in this paper is quite general and can potentially be leveraged 
to develop tractable policies for other inference tasks such as detection or a combination of detection and 
estimation. More general observation models involving linear combinations may also be incorporated; the 
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matched filtering in the radar example in Section [V] is only a preliminary step in this direction. On the 
more theoretical side, the performance curves in Fig. Q] motivate the need for bounds on the achievable 
performance of adaptive sensing that are more refined than the oracle bound. Results in this vein for the 
case of a discrete resource budget have appeared recently |29l . 

Appendix A 

Derivation of posterior probability distributions 

This appendix presents the derivation of posterior probability distributions related to the observation 
model in £T|) and the specification of a dynamic program state. Attention is paid to the adaptive nature 
of the observations, specifically the dependence of the sensing effort \(t) on past observations Y(t). 

First we derive the conditional distribution f(0 [ I, Y(t)). This can be done inductively starting with 
t = 0, in which case there are no observations and f(0 | I, Y(f)) is given by the assumed independent 
Gaussian prior: 

N N 

f(0 | I) = Hm | h) =IJ^i;tt(0) ) <7?(0)). (29) 

1=1 1=1 

Next we assume that f(0 | I,Y(i — 1)) is given and use Bayes' rule to obtain the proportionality 

f(0 | I, Y(i)) oc /(y(t) | 0,I,Y(t - l))f{0 | I, Y(t - 1)) (30) 

as functions of 0. Since conditioning on Y(t - 1) also fixes Aj(t — 1) in £T|), the observations yi(t) are 

conditionally independent and Gaussian and the likelihood term f(y(t) \ 0,I,Y(t — 1)) simplifies to 

JV 

f(y(t) | 0, 1, Y(t - 1)) = II 0(l/<(*); v 2 /h(\i(t - 1))). (31) 

8=1 

From d29ll— d3Tb it can be seen that | I, Y(t) retains an independent Gaussian distribution for all t with 
marginals given by 

f(0i | I h Y(t)) a (Kvifalidi^/hMt - l)))f(9i | I h Y(t - 1)). (32) 

For Ii = 0, Qi is independent of Y(t) and f(9i \ Ii = 0, Y(t)) is given by the ith term in (|29l ). For 
Ii = 1, we parameterize f(9i \ Ii = l,Y(i)) by its mean /J,i(t) and variance (7j 2 (t) as in Section HTl 
A straightforward calculation based on (l32l shows that fii(t) and o"j 2 (t) obey the recursions in (l8bl and 
(|8cT ). Solving (f8cT > for the final-stage variance yields Q. 

We now consider the conditional probability mass function p(I j Y(t)), proceeding inductively as 
before. The case t = corresponds to the prior distribution, assumed to be i.i.d. Bernoulli: 

N N 

p(I) = Y[p(Ii) = nPi(0) /4 (l -^(O)) 1 - 1 '. (33) 

i=l i=l 
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Next we relate p(I \ Y(t)) to p(I | Y(t — 1)) using Bayes' rule: 

,(I I Y(t)) = /(y(*)li.Y(*-i))p(i|Y(t-i)) 

As before, conditioning on Y(i — 1) fixes Aj(t — 1) in CD) and thus y(i) | I, Y(i— 1) is a linear combination 
of the independent Gaussian random vectors | I, Y(t — 1) and n(i). Consequently we obtain 

N 

f(y(t) | I, Y(t - 1)) = II J rf " !)> (* - 1) + ^ 2 A(A 4 (t - 1))), (35) 

i=l 

recalling that | = 1, Y(t — 1)) is parameterized by — 1) and af(t — 1). From (f33T>— (l35T> it 
can be concluded that the components of I | Y(t) remain independent with marginal distributions 

The recursion for pi(t) = Pr(Ii = 1 | Y(t)) in (EaJ) follows from (O and (l36l) . 

Appendix B 

Optimality of the mean estimator for symmetric non-decreasing loss functions 

In this appendix, we show that the expected loss in estimating a random variable 8 is minimized by 
the mean /x of 9 when the loss function is symmetric and non-decreasing and the probability density 
f(9) is symmetric and strictly unimodal, as in the Gaussian case. First we prove the statement for loss 
functions of the form 

0, < a < 5, 

L s (a) = { (37) 

1, a > 5 



E 



u 



/0—S roo 
f{6)d0+ I f(0)d9. (38) 
-oo Je+8 



for 5 > 0. The expected loss for an estimate 9 is then 

9— S roo 

le+5 

The derivative of the expected loss with respect to 9 is f(9 — 5) — f(9 + 5) and is strictly positive for 
9 > /x, strictly negative for 9 < fi, and zero only at 9 = \i given the assumptions of symmetry and strict 
unimodality. Thus the estimate 9 = n is optimal for the loss function Lg. 

A general non-decreasing loss function L can be approximated arbitrarily closely by a sum of functions 
of the form in (I37T ) in a manner reminiscent of Lebesgue integration. Given a step size AL > 0, we 
construct the approximation 

oo 

L{a) = AL L L -i, kAL )(a), 
k=l 
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where L~ x {kAL) denotes the smallest value of a such that L(a) = kAL. By the linearity of expectations, 
the expected value of L (^9 — # ^ is a sum of functions of the form in (138T ). Since 9 = fj, minimizes each 
term in the sum individually, it also minimizes the overall sum and hence the mean estimate is optimal 
for L. As AL — y 0, L converges to L and the statement is proven for L. 

Appendix C 
Solution of problem (|2Qb 

For notational simplicity, we write pi, Ti, \, and A in this appendix for the quantities Pi(t), a 2 /af(t), 
Xi(t), and A(i) in (l20l) . We also use J to denote the cost function. As noted in Section BlI-BI (l20l) 
is a convex minimization problem subject to a simplex constraint and therefore satisfies an optimality 
condition similar to (fLSt : 

81 81 

if A,*>0 then _(A*)<— (A*) Vj^i (39) 



Condition (1391 implies that the optimal solution to (1201 ) satisfies an index rule in the sense that the non- 
zero components of the optimal solution correspond to the largest pj/ri, where 7 = 2/(q + 2). To prove 
this fact, suppose that i and j are such that A* > and A| = but p] /r« < p'J/rj. Then 

dJ _ g Pi _1JEi_ > q p i - ®£. 

d\i ~ 2(r, + A*)V7 2 r f/T ~ 2 ~ OA/ 



contradicting the optimality condition ( [39l . The index rule can be stated in terms of the permutation tt 
defined in (|2T1) . which in the notation of this appendix sorts the quantities pj /r{ in non-increasing order. 
Specifically, we have A* ^ > for i = 1, . . . , k for some integer k, A* ^ = for i = A; + 1, . . . , TV, and 

^(fc)/ r -(fc) > Pl( k +i)/ r n(k+i) strictly. 

The optimality condition ( f39b also implies that the partial derivatives corresponding to non-zero 
components of the optimal solution must be equal. Hence 

_2_8J = P,ii) v 

9^(0 KW+^ (! )) 1/7 
where C is a constant to be determined. A slight rearrangement of (l40l yields the expression in (l23l 

for i = 1, . . . , k. The value of C in (l24l) is obtained by summing d23l ) over « = 1, . . . , k and noting that 

i=i — 2^1=1 A i — A - 

It remains to determine the cutoff index k. This can be done by enforcing the condition A*^ > 
for i = 1, . . . , k and the optimality condition (|39l for j = ir(k + 1), . . . , ir(N) (corresponding to the 
zero-valued components). The first condition is equivalent to 

'V(i) . .. , 

O ^> ?y , Z = 1 , . . . , /C, 

^7r(i) 
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while the second is equivalent to 

C<^-, i = k + l,...,N. (41) 

Pn(i) 

Given the definition of it in (f2TT >. the most stringent conditions correspond to i = k and i = k + 1, i.e., 

Mfc) < A + E*=i M») < r ^(fc+i) (42) 
^(fc) EfU pity Pn(k+i) ' 

upon substituting (1241 . Solving (1421 for A yields the condition — 1) < A < b[k) using the definition 
of b(k) in d22l) . If A; = N, (|4T1 ) is absent and we only have the condition A > b(N — 1), or equivalently 
we may define b(N) = oo. Thus the number of non-zero components k is determined by the interval 
(b(k — 1), b(k)] to which A belongs. This mapping from A to k is well-defined if b(k) is a non-decreasing 
function of k so that the intervals (b(k — l),b(k)] are non-overlapping and span the positive real line. 
Indeed we have 



Pn(k+l) i=l ' ' i=l 
r k k 

i= i i= i 
fc-1 fc-1 

_ 1 7r(fc) \ ^ -y _ \ ^ 

^TT(fc) j=l i= l 
= b(k~l), 

where the inequality is due to (1211 . 
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original non-adaptive, P = 3 pulses 




20 40 60 80 100 120 20 40 60 80 100 120 

(c) (d) 

Fig. 6. Portion of original image (a) in radar imaging example and reconstructions from P = 3 pulses per location allocated 
non-adaptively (b), using 5-stage DS (c), and using 5-stage OLFC-MSE (d). OLFC suppresses noise more strongly around each 
tank and recovers details more faithfully. 
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non-adaptive, P = 2 pulses DS, P = 2 pulses, T = 5 stages 




Fig. 7. One-dimensional profiles passing through the line of tanks in Fig. |6(a)| Middle curves indicate the true image intensities 
while upper and lower curves correspond to one standard deviation above and below the mean of 100 reconstructions for each 
policy using P = 2 pulses per location. The 5-stage OLFC policy reduces the standard deviation by a further 2-3 dB relative 
to the 2-stage OLFC policy. 



